1. Field of the Invention
The present invention relates to a form recognizing apparatus, method, program and storage medium for automatically recognizing forms.
2. Description of the Related Art
Form recognition, which automatically categorizes forms into registered formats, is an effective way to input and categorize a large number of forms.
In these inputting and categorizing processes, a feature amount is extracted from the data of a form image read through a scanner, for example, and form format data is created from it. A similarity in format data is then calculated between the input form and each registered form, and the registered form having the highest similarity is output as the recognition result.
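By way of illustration only, the matching step described above may be sketched as follows. The sketch assumes that the format data of each form has already been reduced to a numeric feature vector and uses cosine similarity as the similarity measure; the names `cosine_similarity` and `recognize_form` and the choice of measure are hypothetical and are not taken from any cited publication.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def recognize_form(input_features, registered_forms):
    """Return (name, score) of the registered form most similar to the input.

    registered_forms maps a format name to its feature vector.
    """
    best_name, best_score = None, -1.0
    for name, features in registered_forms.items():
        score = cosine_similarity(input_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical registered formats and an input form's feature vector.
registered = {
    "invoice": [1.0, 0.0, 0.5],
    "order": [0.0, 1.0, 0.5],
}
name, score = recognize_form([0.9, 0.1, 0.5], registered)
```

In this hypothetical example the input vector lies closer to the "invoice" vector, so that format would be output as the recognition result.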
A similarity in format data may be determined, for example, by referring to the tables in the forms and calculating a similarity value based on the ratio of the form area on one page to the total area of the tables included on that page, as disclosed in Japanese Patent Laid-Open No. 2000-285187.
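A rough sketch of an area-ratio comparison of this kind is given below. It is an illustrative assumption, not the method of the cited publication: each table is represented by a hypothetical (width, height) pair, the ratio is taken as total table area divided by page area so that it stays between 0 and 1, and the similarity value is one minus the absolute difference between the two ratios.

```python
def table_area_ratio(page_width, page_height, tables):
    """Fraction of the page covered by tables (assumed definition).

    tables is a list of (width, height) pairs; overlap between tables
    is ignored in this sketch.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    total_table_area = sum(w * h for w, h in tables)
    return total_table_area / page_area


def area_ratio_similarity(ratio_a, ratio_b):
    """Similarity value that is 1.0 for identical ratios (assumed formula)."""
    return 1.0 - abs(ratio_a - ratio_b)


# Hypothetical pages, all 200 x 100 units.
input_ratio = table_area_ratio(200, 100, [(100, 50), (100, 50)])
same_layout = table_area_ratio(200, 100, [(200, 50)])
small_table = table_area_ratio(200, 100, [(50, 40)])

sim_same = area_ratio_similarity(input_ratio, same_layout)
sim_small = area_ratio_similarity(input_ratio, small_table)
```

A single ratio of this kind cannot distinguish two layouts whose tables cover the same fraction of the page, as the first comparison in the sketch shows.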
However, in the form recognition processing disclosed in Japanese Patent Laid-Open No. 2000-285187, the subject of recognition is limited to a one-page form, which may be inconvenient.
On the other hand, multiple forms may be read continuously, as disclosed in Japanese Patent Laid-Open No. 10-269311, for example. According to Japanese Patent Laid-Open No. 10-269311, a partition form is placed in advance between the processing units to be processed at one time. Then, as the form group is read, the output of the subsequent reading results is changed every time a partition form is read. Furthermore, only the form immediately after a partition form is used to identify the format type.
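The partition-based flow described above may be sketched as follows, purely for illustration. The functions `split_by_partition` and `identify_units`, the predicate `is_partition`, and the page labels are hypothetical; how a partition form is detected and how a format is identified are left as caller-supplied functions because the text above does not specify them.

```python
def split_by_partition(pages, is_partition):
    """Split a sequence of read pages into processing units.

    A new unit starts each time a partition form is read; the partition
    form itself is not included in any unit.
    """
    units, current = [], []
    for page in pages:
        if is_partition(page):
            if current:
                units.append(current)
            current = []
        else:
            current.append(page)
    if current:
        units.append(current)
    return units


def identify_units(units, identify_format):
    """Label each unit using only its first page, as in the described flow."""
    return [(identify_format(unit[0]), unit) for unit in units]


# Hypothetical scan order: "SEP" stands for a partition form.
pages = ["SEP", "inv-1", "inv-2", "SEP", "ord-1"]
units = split_by_partition(pages, lambda p: p == "SEP")
labeled = identify_units(units, lambda p: p.split("-")[0])
```

Because `identify_units` looks only at the first page of each unit, a single poorly read page determines the format assigned to the whole unit.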
However, in the recognition processing disclosed in Japanese Patent Laid-Open No. 10-269311, a partition form must be created and placed in advance between processing units, which complicates the processing. Furthermore, since only the form immediately after a partition form is used to identify the format, misrecognition is highly likely to occur.